Binary Neural Architecture Search

FIGURE 4.7
The operations of each edge. Each edge has four convolutional operations, comprising two types of binarized convolution (depth-wise separable and dilated) with 3 × 3 or 5 × 5 receptive fields, and four non-convolutional operations (zero, identity, max pooling, and average pooling).

4.3.2 Search Space

We search for computation cells as the building blocks of the final architecture. As in [305, 306, 151], we construct the network with a predefined number of cells, and each cell is a fully connected directed acyclic graph (DAG) G with M nodes, {N1, N2, ..., NM}. For simplicity, we assume that each cell takes only the outputs of the two previous cells as input and that each input node has predefined convolutional operations for preprocessing. Each node Nj is obtained as Nj = Σi<j o(i,j)(Ni), where Ni is a node on which Nj depends, with the constraint i < j to avoid cycles within a cell. We also define the two nodes without input, N0 and N1, as the first two nodes of a cell. Each node is a specific tensor (a feature map), and each directed edge (i, j) denotes an operation o(i,j)(·), sampled from the following K = 8 operations:

no connection (zero)

skip connection (identity)

3 × 3 dilated convolution with rate 2

5 × 5 dilated convolution with rate 2

3 × 3 max pooling

3 × 3 average pooling

3 × 3 depth-wise separable convolution

5 × 5 depth-wise separable convolution
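
The cell computation Nj = Σi<j o(i,j)(Ni) can be sketched as follows. The placeholder operations (`dummy_op`) stand in for the real convolutions and poolings and are illustrative assumptions, not part of the actual search space implementation:

```python
# Minimal sketch of the DAG cell: each node sums one sampled operation
# per incoming edge (i, j) with i < j, so no cycles can form.

def zero(x):          # "no connection": contributes nothing
    return 0.0

def identity(x):      # "skip connection"
    return x

def dummy_op(scale):
    # Stand-in for a parameterized operation (dilated conv, pooling, ...);
    # scaling the input keeps the data flow visible.
    return lambda x: scale * x

def compute_cell(inputs, edge_ops, num_nodes):
    """inputs: values of the first two nodes (outputs of previous cells).
    edge_ops: dict mapping edge (i, j), i < j, to a sampled operation."""
    nodes = list(inputs)
    for j in range(len(inputs), num_nodes):
        # N_j = sum over predecessors i < j of o^(i,j)(N_i)
        nodes.append(sum(edge_ops[(i, j)](nodes[i]) for i in range(j)))
    return nodes

ops = {(0, 2): identity, (1, 2): dummy_op(2.0),
       (0, 3): zero, (1, 3): identity, (2, 3): dummy_op(0.5)}
print(compute_cell([1.0, 3.0], ops, num_nodes=4))  # [1.0, 3.0, 7.0, 6.5]
```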

We replace the depth-wise separable convolution with a binarized form, as shown in Figs. 4.7 and 4.8. Optimizing BNNs is more challenging than optimizing conventional CNNs [77, 199], since binarization places an additional burden on NAS.
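
As a rough sketch of the binarized block in Fig. 4.8 (right), the following assumes an XNOR-Net-style sign binarization with a scalar scaling factor α = mean(|W|) and inference-style batch normalization; both are assumptions for illustration, not necessarily CP-NAS's exact scheme:

```python
import numpy as np

def binarize(w):
    """Sign binarization with scaling alpha = mean(|w|) (XNOR-Net-style;
    an assumption, the exact scheme is not specified here)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def batch_norm(x, eps=1e-5):
    # Per-channel normalization over spatial dims (inference-style, no affine).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def depthwise_conv3x3(x, w):
    """x: (M, H, W); w: (M, 3, 3). Each channel gets its own 3x3 kernel."""
    M, H, W = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(M):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(pad[c, i:i + 3, j:j + 3] * w[c])
    return out

def pointwise_conv1x1(x, w):
    """x: (M, H, W); w: (N, M) -> output (N, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def binarized_separable(x, w_dw, w_pw):
    # Fig. 4.8 (right): binarized dwise 3x3 -> BN -> binarized conv 1x1 -> BN.
    y = batch_norm(depthwise_conv3x3(x, binarize(w_dw)))
    return batch_norm(pointwise_conv1x1(y, binarize(w_pw)))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))       # M = 4 input channels
w_dw = rng.standard_normal((4, 3, 3))    # one 3x3 kernel per channel
w_pw = rng.standard_normal((6, 4))       # N = 6 output channels
out = binarized_separable(x, w_dw, w_pw)
print(out.shape)  # (6, 8, 8)
```

Binarizing both the depth-wise and the point-wise weights means each multiply reduces to a sign flip plus one scaling by α, which is the source of the memory and compute savings.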

FIGURE 4.8
Compared with the original depth-wise separable convolution (left: Dwise 3 × 3 with M channels, BN + ReLU, then Conv 1 × 1 with N channels and BN), the new binarized depth-wise separable convolution designed for CP-NAS (right) replaces both layers with binarized counterparts (Binarized dwise 3 × 3 and Binarized conv 1 × 1) and uses BN without ReLU.